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Abstract 

We present a parallelizable algorithm for computing the persistent homology of a filtered 
chain complex. Our approach differs from the commonly used reduction algorithm by first 
computing persistence pairs within local chunks, then simplifying the unpaired columns, and 
finally applying standard reduction on the simplified matrix. The approach generalizes a tech- 
nique by Gunther et al., which uses discrete Morse Theory to compute persistence; we derive 
the same worst-case complexity bound in a more general context. The algorithm employs 
several practical optimization techniques which are of independent interest. Our sequential 
implementation of the algorithm is competitive with state-of-the-art methods, and we improve 
the performance through parallelized computation. 

1 Introduction 

Persistent homology has developed from a theoretical idea to an entire research area within 
the field of computational topology. One of its core features is its multi-scale approach to 
analyzing and quantifying topological features in data. Recent examples of application areas 
are shape classification [2], topological denoising [1], or developmental biology [7]. 

A second major feature of persistent homology is the existence of a simple yet efficient 
computation method: The standard reduction algorithm as described in [8, 17] computes the 
persistence pairs by a simple sequence of column operations; Algorithm 1 gives a complete 
description in just 10 lines of pseudo-code. The worst-case complexity is cubic in the input 
size of the complex, but the practical behavior has been observed to be closer to linear on 
average. Various variants have been proposed in order to improve the theoretical bounds [14, 
3] or the practical behavior [5]. 

Our first contribution consists of two simple optimization techniques of the standard re- 
duction algorithm, which we call clearing and compression. Both approaches exploit the 
special structure of a filtered chain complex in order to significantly reduce the number of 
operations on real-world instances. However, the two methods cannot be easily combined 
because they require the columns of the boundary matrix to be processed in different orders. 
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Our second contribution is a novel algorithm that incorporates both of the above opti- 
mization techniques, and is also suitable for parallelization. It proceeds in three steps: In the 
first step, the columns of the matrix are partitioned into consecutive chunks. Each chunk is 
reduced independently, applying the clearing optimization mentioned above. In this step, the 
algorithm finds at least all persistence pairs with (index) persistence less than the size of the 
smallest chunk; let g be the number of columns not paired within this first step. In the second 
step, the g unpaired columns are compressed using the method mentioned above. After com- 
pression, each column has at most g non-zero entries and the unpaired columns form a nested 
(g x g)-matrix. In the third and final step, this nested matrix is reduced, again applying the 
clearing optimization. 

The chunk algorithm is closely related to two other methods for computing persistence. 
First of all, the spectral sequence algorithm [6, §VII.4] decomposes the matrix into blocks 
and proceeds in several phases, computing in phase r the persistence pairs lying r — 1 blocks 
apart. The first step of the chunk algorithm is equivalent to applying the first two phases of the 
spectral sequence approach. Furthermore, the three step chunk algorithm is inspired by the 
approach of Gunther et al. [10], which combines persistence computation and discrete Morse 
theory for 3D image data. The first step of that algorithm consists in constructing a discrete 
gradient field consistent with the input function; such a gradient field can be interpreted as a 
set of persistence pairs that are incident in the complex and have persistence 0. We replace 
this method by local persistence computations, allowing us to find pairs with are not incident 
in the complex. 

We analyze the chunk algorithm in terms of time complexity. Let n be the number of 
generators (simplices, cells) of the chain complex, m the number of chunks, I the maximal 
size of a chunk, and g as above. We obtain a worst-case bound of 

0(m£ 3 + g en+g 3 ), 

where the three terms reflect the worst-case running times of the three steps. For the filtra- 
tion of a cubical complex induced by a J-dimensional grayscale image (with d some fixed 
constant), this bound simplifies to 

0(gn + g 3 ), 

if the chunks are given by the cells appearing simultaneously in the filtration. This bound 
improves on the previous general bound of 0(g 2 nlogn) from [4]. Moreover, it matches the 
bound in [10], but applies to arbitrary dimensions. Of course, the bound is still cubic if 
expressed only in terms of n because g G 0(n) in the worst case. 

We implemented a sequential and a parallelized version of the chunk algorithm; both 
are publicly available in our new PHAT library (http://phat.googlecode.com/). The 
sequential code already outperforms the standard reduction algorithm and is competitive to 
other variants with a good practical behavior. The parallelized version using 12 cores yields 
a speed-up factor between 3 and 11 (depending on the example) in our tests, making the 
implementation the fastest among the considered choices. This is the first result where the 
usefulness of parallelization is shown for the problem of persistence computation through 
practical experiments. 

2 Background 

This section summarizes the theoretical foundations of persistent homology as needed in this 
work. We limit our scope to simplicial homology over Z2 just for the sake of simplicity in the 
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description; our methods generalize to chain complexes over arbitrary fields. 

Homology Homology is an algebraic tool for analyzing the connectivity of topological 
spaces. Let K be a simplicial complex of dimension d. In any dimension p, we call a p- 
chain a formal sum of the ^-simplices of K with Z2 coefficients. The ^-chains form a group 
called the pth chain group C p . The boundary of a ^-simplex a is the (p — l)-chain formed 
by the sum of all faces of a of codimension 1 . This operation extends linearly to a boundary 
operator 8 : C p — > C p -\. A p-ch&m / is a p-cycle if 8 (7) = 0. The ^-cycles form a subgroup 
of the /^-chains, which we call the pth cycle group Z p . A p-cham 7 is called a p-boundary 
if 7 = 8(E,) for some (p + 1) -chain Again, the ^-boundaries form a group B p , and since 
8(8(1;)) = for any chain ^-boundaries are ^-cycles, and so B p is a subgroup of Z p . The 
pth homology group H p is defined as the quotient group Z p /B p . The rank of H p is denoted 
by P p and is called the pth Betti number. In our case of Z2 coefficients, the homology group 

is a vector space isomorphic to Z^, hence it is completely determined by the Betti number. 
Roughly speaking, the Betti numbers in dimension 0, 1, and 2 yield the number of connected 
components, tunnels, and voids of K, respectively. 

Persistence Let {<7i, . . . ,o n } denote the simplices of K. We assume that for each i < n, 
Ki := {ci , . . . , a,} is a simplicial complex again. The sequence of inclusions = Kq C . . . C 
Ki . . . C K n = K is called a simplexwise filtration of K. For every dimension p and every Ki, 
we have a homology group H p (Ki); we usually consider all dimensions at once and write 
H(Kj) for the direct sum of the homology groups of Kj in all dimensions. The inclusion 
Kj ^ Ki + i induces a homomorphism : H(Kj) — > H(Ki + \) on the homology groups. These 
homomorphisms compose and we can define g\ : H(Ki) — > H(Kj) for any i < j. We say that 
a class a G H(Kf>) is born at (index) i if a G img| but a ^ img| j. A class a born at index i 
dies entering (index) j if gj(cc) G img/ ^ but g/ _1 (a) ^ img/lj . In this case, the index pair 
is called a persistence pair, and the difference j — / is the (index) persistence of the pair. 
The transition from to either causes the birth or the death of an homology class. We 
call the added simplex a,- positive if it causes a birth and negative if it causes a death. Note 
that homology classes of the full complex K do not die during the filtration. We call a simplex 
a, that gives birth to such a class essential. All other simplices are called inessential. 

Boundary matrix For a matrix M e Z^", we let Mj denote its j-th column, M' its i- 
th row, and Mj G Z2 its entry in row i and column j. For a non-zero column 7^ Mj < = 
(mi ,...,m n ) G Z'2, we set pivot(M ; ) := max{/ = 1, . . . ,n \ m,- = 1} and call it the pivot index 
of that column. 

The boundary matrix D G (Z2)" xn of a simplexwise filtration (Kfii is a « x n matrix with 
D'j = 1 if and only if a, is a face of Cy of codimension 1. In other words, the y'th column 
of D encodes the boundary of Oj. D is an upper-triangular matrix because any face of a, 
must precede Oj in the filtration. Since the jth row and column of D corresponds to the 
jth simplex a, of the filtration, we can talk about positive columns, negative columns, and 
essential columns in a natural way, and similarly for rows. 

The reduction algorithm A column operation of the form My ^— My +M k is called left- 
to-right if k < j. We call a matrix M' derived from M if M can be transformed into M' by 
left-to-right operations. Note that in a derivation M' of M, the jth column can be expressed as 
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a linear combination of the columns 1, . . . ,j of M, and this linear combination includes Mj. 
We call a matrix R reduced if no two non-zero columns have the same pivot index. If R is 
derived from M, we call it a reduction ofM. In this case, we define 

P R := \ Rj^OAi = V ivot{Rj)} 

E R := {i\Ri = OApivot(Rj)^Nj = \,...,n}. 

Although the reduction matrix R is not unique, the sets Pr and Er are the same for any choice 
of reduction; therefore, we can define Pm and Em to be equal to Pr and Er for any reduction 
R of M. We call the set P the persistence pairs of M. When obvious from the context, we 
omit the subscripts and simply write P for the persistence pairs. For the boundary matrix D 
of K, the pairs (i,j) £ P are the persistence pairs of the filtration (Xi)o</<n> and the indices in 
E correspond to the essential simplices of the complex. Note that E is uniquely determined 
by P and n as the indices between 1 and n that do not appear in any pair of P. 

The simplest way of reducing D is to process columns from left to right; for every column, 
other columns are added from the left until the pivot index is unique (Algorithm 1). A lookup 
table can be used to identify the next column to be added in constant time. A flag is used 
for every column denoting whether a persistence pair with the column index has already been 
found. After termination, the unpaired columns correspond to the essential columns. The 
running time is at most cubic in n, and this bound is actually tight for certain input filtrations, 
as demonstrated in [16]. 

Algorithm 1 Left-to-right persistence computation 
l: procedure Persistence_left_right(D) 

2: J?<-D;L<-[O,...,O];P<-0 >LgZ" 

3: for j = 1,. . . ,n do 

4: while Rj ^ and L[pivot(i?/)] ^ do 

5: R j ^ R j + R L{piwt(Rj)] 

6: ifi^^Othen 

7: i <r- pivot(Rj) 

8: L[l] <- j 

9: Mark columns i and j as paired and add (z, j) to P 

10: return P 



Let M be derived from D. A column Mj of M is called reduced if either it is zero, or if 
(i,j) G P with i = pivot(M 7 ). With this definition, a matrix M is a reduction of D if and only 
if every column is reduced. 

3 Speed-ups 

Algorithm 1 describes the simplest way of reducing the boundary matrix, but it performs 
more operations than actually necessary to compute the persistence pairs. We now present 
two simple techniques which both lead to a significant decrease in the number of required 
operations. 

Clearing positive columns The key insight behind our first optimization is the following 
fact: if i appears as the pivot in a reduced column of M, the index i is positive and hence there 
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exists a sequence of left-to-right operations on M ; - that turn it to zero. Instead of explicitly 
executing this sequence of operations, we define the clear operation by setting column M, to 
zero directly. Informally speaking, a clear is a shortcut to avoid some column operations in 
the reduction when the result is evident. 

In order to apply this optimization, we change the traversal order in the reduction by first 
reducing the columns corresponding to simplices with dimension d (from left to right), then all 
columns with dimension d — 1 , and so on. After having reduced all columns with dimension 
8, we have found all positive inessential columns with dimension 5 — 1 and clear them before 
continuing with 5 — 1 . This way all positive inessential columns of the complex are cleared 
without performing any column additions on them. See [4] for a more detailed description. 

Compression Alternatively, we can try to save arithmetic operations by reducing the num- 
ber of non-zero rows among the unpaired columns. A useful observation in this context is 
given next. 

Lemma 1. Let Mj be a non-zero column ofM with i = pivot (My). Then Mi is a positive and 
inessential column. 

Proof. The statement is clearly true if My is reduced, because in this case ( i, j) is a persistence 
pair. If Mj is not reduced, this means that after applying some sequence of left-to-right column 
operations, some reduced column has i as pivot index. □ 

Corollary 2. Let Mi be a negative column ofM. Then i is not the pivot index of any column 
inM. 

As a consequence, whenever a negative column with index j has been reduced, row j can 
be set to zero before further reducing. 

Corollary 3. Let Mi be a negative column and let Mj be a column with Mj = 1. Then setting 
M'j to zero does not affect the pairs. 

We can even do more: let i be the pivot index of the reduced column Mj and assume that 
the submatrix of M with column indices {1, . . . , j} and row indices {/, ...,«} is reduced, i.e., 
the pivot indices are unique in this submatrix. By adding column j to each unreduced column 
in the matrix that has a non-zero entry at row i, we can eliminate all non-zero entries in row i 
from the unreduced columns. Note that if k < j and M l k / 0, then pivot (M^) > i and thus, by 
assumption, M* must be a reduced negative column. Therefore, for each unreduced column 
Mfc, the operation M^ <— M^ +Mj is a left-to-right addition and thus does not affect the pairs. 

4 Reduction in chunks 

The two optimization techniques from Section 3 both yield significant speed-ups, but they are 
not easily combinable, because clearing requires to process a simplex before its faces, whereas 
compression works in the opposite direction. In this section, we present an algorithm which 
combines both optimization techniques. 

Let m G N. Fix m + 1 numbers = to < h < ■ ■ ■ < t m -\ <t m = n and define the ith chunk 
of D to be the columns of D with indices {U-\ + 1, ... ,*,■}. We call a column Dj local if it 
forms a persistence pair with another column in the same chunk or in one of the adjacent 
chunks. In this case, we also call the persistence pair local. Non-local columns (and pairs) are 
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called global. If I is a lower bound on the size of each chuck, then every global persistence 
pair has index persistence at least i. We also call an index j local if the jth column of D 
is local, and the same for global. We denote the number of global columns in D by g. The 
high-level description of our new algorithm consists of three steps: 

1 . Partially reduce every chunk independently, applying the clearing optimization, so that 
all local columns are completely reduced. 

2. Independently compress every global column such that all its non-zero entries are global. 

3. Reduce the submatrix consisting only of the global rows and columns. 

We give details about the three steps in the rest of this section. The first two steps can be 
performed in parallel, whereas the third step only needs to reduce a matrix of size g x g 
instead of n x n. In many situations, g is significantly smaller than n. 

Local chunk reduction The first step of our algorithm computes the local pairs by per- 
forming two phases of the spectral sequence algorithm [6, §VII.4]. Concretely, we apply 
left-to-right operations as usual, but in the first phase we only add columns from the same 
chunk, and in the second phase we only add columns from both the same chunk and its left 
neighbor. After phase r, for each b G {r, . . . , m} the submatrix with column indices { 1 , . . . 
and row indices {tb-i + 1 ,...,«} is reduced. If the reduction of column j stops at a pivot index 
i > tj,_ r , row j cannot be reduced any further by adding any column, so we identify as 
a local persistence pair. Conversely, any local pair (i, j) is detected by this method after two 
phases. We incorporate the clearing operation for efficiency, that is, we proceed in decreasing 
dimension and set detected local positive columns to zero; see Algorithm 2. After its exe- 
cution, L[i] contains the index of the local negative column with pivot index i for any local 
positive column i, and the resulting matrix R is a derivation of D in which all local columns 
are reduced. 

Algorithm 2 Local chunk reduction 
l: procedure Local_reduction(M, f , ...,t m ) 



2: R<-M;L<-[0,...,0];P<-Q >LeZ n 

3: for 8 =J,...,0do 

4: for r = 1 , 2 do > Perform two phases of the spectral sequence algorithm 

5: for fe = r,...,mdo > Loop is parallelizable 

6: for j = tb-\ + 1, . . . ,tb with dimCj = <5 do 

7: if j is not marked as paired then 

8: while Rj ^ A L [pivot (/?_/)] ^ A pivot(fl,-) > t b - r do 

9: R j ^ R j + R L[pivot(j)] 

10: ifAj-^Othen 

11: i<- pivot (Rj) 

12: if i > tb- r then 

13: L[l\ <- j 

14: Rj «— > Clear column i 

15: Mark i and j as paired and add (/, j) to P 

16: return (R,L,P) 
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Algorithm 3 Determining active entries 



l: procedure Mark_active_entries(7?) 

2: for each unpaired column k do 
3: Mark column(7?, k) 

4: function Mark columnar, k) 

5: if is marked as active/inactive then return true/false 

6: for each non-zero entry index i of do 

7: if £ is unpaired then 

8: mark k as active and return true 

9: else if i is positive then 

10: j <- L[l\ 

11: if j ^ k and Mark_column(7?, j) then 

12: mark k as active and return true 

13: mark k as inactive and return false 



> (i, j) is persistence pair 



> Loop is parallelizable 



Global column compression Let R be the matrix returned by Algorithm 2. Before com- 
puting the global persistence pairs, we first compress the global columns, using the ideas from 
Section 3; recall that negative rows can simply be set to zero, while entries in positive rows 
can be eliminated by an appropriate column addition. Note, however, that a full column ad- 
dition might actually be unnecessary: for instance, if all non-zero row indices in the added 
column belong to negative columns (except for the pivot), the entry in the local positive row 
could just have been zeroed out in the same way as in Corollary 3. Speaking more generally, 
it is more efficient to avoid column additions that have no consequences for global indices, 
neither directly nor indirectly. 

In the spirit of this observation, we call an index i inactive if either it is a local negative 
index or if (i, j) is a local pair and all indices of non-zero entries in column Rj apart from i are 
inactive. Otherwise, the index is called active. By induction and Corollary 3, we can show: 

Lemma 4. Let i be an inactive index and let Mj be any column with Mj = 1. Then setting Mj 
to zero does not affect the persistence pairs. 

The compression proceeds in two steps: first, every non-zero entry of a global column is 
classified as active or inactive (using depth-first search; see Algorithm 3). Then, we iterate 
over the global columns, set all entries with inactive index to zero, and eliminate any non-zero 
entry with a local positive index I by column addition with L[l\ (see Algorithm 4). After 
this process, we obtain a matrix R' with the same persistence pairs as R, such that the global 
columns of R' have non-zero entries only in the global rows. 

Submatrix reduction After having compressed all global columns, these form a g x g 
matrix "nested" in R (recall that g is the number of global columns). To complete the compu- 
tation of the persistence pairs, we simply perform standard reduction on the remaining matrix. 
For efficiency, we perform steps 2 and 3 alternatingly for all dimensions in decreasing order 
and apply the clearing optimization; this way, we avoid the compression of positive global 
columns. Algorithm 5 summarizes the whole method. 



7 



Algorithm 4 Global column compression 



l: procedure Compressor) 

2: Uses variables: L 

3: for each non-zero entry index £ of R^ in decreasing order do 

4: if £ is paired then 

5: if £ is inactive then 

6: R\ <- 

7: else 

8: j -f- L[£] > (£, j) is persistence pair 

9: R k ^R k + Rj 



Algorithm 5 Persistence in chunks 
l: procedure Persistence_in_chunks(D, to, ...,t m ) 

2: (R, L, P) ^— Local reduction(D, to,...,t m ) > step 1 : reduce local columns 

3: MARK_ACTIVE_ENTRIES(/?) 

4: for 8 = d, . . . ,0 do 

5: > step 2: compress global columns 

6: for j = 1 , . . . , n with dim Oj = 8 do > Loop is parallelizable 

7: if column j is not paired then 

8: Compressor, y - ) 

9: for j = 1, . . . , n with dimcr, = 8 do > step 3: reduce global columns 

10: while Rj ^ A L[pivot(/?y)] ^ do 

11: R j ^ R j + R L[pivot(j)] 

12: ifi? 7 ^0then 

13: i <- pivot(i?y) 

14: L[i] <- j 

15: i?,- <— > Clear column z' 

16: Mark ?' and j as paired and add (i, j) to P 

17: return P 



5 Analysis 

Algorithm 5 permits a complexity analysis depending on the following parameters: n, the 
number of simplices; m, the number of chunks; I, the maximal size of a chunk; and g, the 
number of global columns. We assume that for any simplex, the number of non-zero faces of 
codimension 1 is bounded by a constant (this is equivalent to assuming that the dimension of 
the complex is a constant). 

General complexity We show that the complexity of Algorithm 5 is bounded by 

0(m£ 3 +g£n + g 3 ). (1) 
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The three summands correspond to the running times of the three steps 1 . Note that g G 0(n) 
in the worst case. 

For the complexity of Algorithm 2, we consider the complexity of reducing one chunk, 
which consists of up to i columns. Within the local chunk reduction, every column is only 
added with columns of the same or the previous chunk, so there are only up to 21 column 
additions per column. Moreover, since the number of non-zero entries per column in D is 
assumed to be constant, there are only 0(£) many entries that can possibly become non-zero 
during the local chunk reduction. It follows that the local chunk reduction can be considered 
as a reduction on a matrix with £ columns and 0(1) rows. If we represent columns by linked 
lists (containing the non-zero indices in sorted order), one column operation can be done in 
0(£) primitive operations, which leads to a total complexity of 0(£ 3 ) per chunk. 

The computation of active columns in Algorithm 3 is done by depth-first search on a graph 
whose vertices are given by the columns and whose edges correspond to their non-zero entries. 
The number of edges is 0(n£), so we obtain a running time of 0(n£). 

Next, we consider the cost of compressing a global column with index j. After the pre- 
vious step, the column has at most 0{£) non-zero entries. We transform the presentation of 
the column from a linked list into a bit vector of size n. In this representation, adding another 
column in list representation with v entries to column j takes time proportional to v. In the 
worst case, we need to add all columns with indices 1 ,...,_/— 1 to j. Each such column has 
0(1) entries. At the end, we transform the bit vector back into a linked list representation. 
The total cost is 0(n + (j—l)£ + n) = 0(n£) per global column. 

Finally, the complexity of the global reduction is 0(g 3 ), as in the standard reduction. 

Choosing chunks We discuss different choices of chunk size and their complexities. A 
generic choice for an arbitrary complex is to choose 0(y/n) chunks of size 0(y/n) each. With 
that, the complexity of (1) becomes 

0(n 2 + gl n^h~ + g\). 
Alternatively, choosing O(j^) chunks of size 0(\ogn), the complexity becomes 

0(n log 2 n + g 2 n log n + g\) ■ 

We replaced g by gi and g2 to express that the number of global columns is different in both 
variants. In general, choosing larger chunks is likely to produce less global columns, since 
every global persistence pair has index persistence at least i (the size of the smallest chunk) . 

Cubical complexes We consider an important special case of boundary matrices: consider 
a ^-dimensional image with p hypercubes, where each vertex contains a grayscale value. We 
assume that the cubes are triangulated conformally in order to get simplicial input - the argu- 
ment also works, however, for the case of cubical cells. We assign function values inductively, 
assigning to each simplex the maximal value of its faces. Assuming that all vertex values are 
distinct, the lower star of vertex v is the set of all simplices which have the same function 
value as v. Filtering the simplices in a way that respects the order of the function values, 
we get a lower star filtration of the image. Now choose the lower stars as the chunks in 

^he running time of the third step could be lowered to g m , where 0) is the matrix-multiplication exponent, using 
the method of [14]. 
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our reduction algorithm. Note that the lower star is a subset of the star of the corresponding 
vertex, which is of constant size (assuming that the dimension d is constant). Therefore, the 
complexity bound (1) reduces to 

0{n+gn + g 3 ) = 0{gn+g i ). 

Note that global columns with large index persistence might still have very small, or even 
zero, persistence with respect to the function values, for instance in the presence of a flat 
region in the image where many vertices have similar values. 

6 Experiments 

We implemented two versions of the algorithm presented in Section 4: a sequential and 
a parallel version (using OpenMP), in which the first two steps of the algorithm are per- 
formed simultaneously on each chunk and on each global column, respectively. In both cases, 
we use as the chunk size. For a fair comparison, we also re-implemented the algo- 

rithms introduced in [4, 8] in the same framework, that means, using the same data repre- 
sentations and low-level operations such as column additions. Our implementation is pub- 
licly available in our new PHAT library for computing persistence homology, available at 
http : //phat . googlecode . com/. Additionally, we compare to the memory efficient algo- 
rithm [10] based on discrete Morse theory [9] and to the implementation of the persistent 
cohomology algorithm [5] found in the DIONYSUS library [15]. 

To find out how these algorithms behave in practice, we apply them to five representative 
data sets. The first three are 3D image data sets with a resolution of 128 3 . The first of 
these is given by a Fourier sum with random coefficients and is representative of smooth 
data. The second is uniform noise. The third is the sum of the first two and represents large- 
scale structures with some small-scale noise. These data sets are illustrated in Figure 1 by an 
isosurface. 




Figure 1 : A single isosurface of the representative data sets used to benchmark and compare our 
algorithm: a) a smooth data set, b) uniformly distributed noise, c) the sum of a) and b). 

In addition to the lower star filtrations of these image data sets, we also consider an alpha 
shape filtration defined by 10000 samples of a torus embedded in R 3 , and the 4-skeleton of 
the Rips filtration given by 50 points randomly chosen from the Mumford data set [1 1]. 

As pointed out in [5] , the pairs of persistent cohomology are the same as those of persis- 
tent homology. We therefore also applied all algorithms to the corresponding cochain filtra- 
tion, given in matrix form by transposing the boundary matrix and reversing the order of the 
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columns; this operation is denoted by (■) J -. When reducing such a coboundary matrix with 
the clearing optimization, columns are processed in order of increasing instead of decreasing 
dimension. 

Table 1 contains the running times of the above algorithms applied to filtrations of these 
five data sets run on a PC with two Intel Xeon E5645 CPUs. We can observe a huge speed-up 
caused by the clearing optimization as already reported in [4]. We can see that our chunk 
algorithm performs slightly worse than the one of [4] when executed sequentially, but faster 
when parallelized. We also observe that standard, twist, and chunk generally behave worse 
when computing persistent cohomology, except for the Rips filtration of the Mumford data 
set, where the converse is true. 



Dataset 


n-lO" 6 


std. [8] 


twist [4] 


cohom. [5] 


DMT [10] 


g/n 


chunk (lx) 


chunk (12x) 


Smooth 
Smooth^ 


16.6 
16.6 


383s 
432s 


3.1s 
11.3s 


65.8s 
20.8s 


2.0s 


0% 
0% 


5.0s 
6.3s 


0.9s 
0.9s 


Noise 
Noise^ 


16.6 
16.6 


336s 
1200s 


17.2s 
29.0s 


15971s 
190.1s 


13.0s 


9% 
9% 


28.3s 
31.1s 


6.3s 
5.8s 


Mixed 
Mixed^ 


16.6 
16.6 


330s 
446s 


5.8s 
13.0s 


50927s 
32.7s 


12.3s 


5% 
5% 


21.6s 
32.0s 


2.4s 
2.9s 


Torus 
Torus^ 


0.6 
0.6 


52s 
24s 


0.3s 
0.3s 


1.6s 
1.4s 




7% 
7% 


0.3s 
0.9s 


0.1s 
0.2s 


Mumford 
Mumford^ 


2.4 
2.4 


38s 
58s 


35.2s 
0.2s 


2.8s 
184.1s 




82% 
82% 


14.6s 
1.5s 


1.8s 
0.4s 



Table 1 : Running time comparison of various persistent homology algorithms applied to the data 
sets described in Section 6. The last three columns contain information of the algorithm presented 
in this paper: the fraction of global columns g/n, and the running times using one and twelve 
cores, respectively. 



7 Conclusion and Outlook 

We have presented an algorithm for persistent homology that includes two simple optimization 
techniques into the reduction algorithm. It can be fully parallelized, except for the reduction 
of compressed global columns, whose number is often small. Besides our asymptotic com- 
plexity bounds, which give a detailed dependence on the parameters of the algorithm, our 
experiments show that significant speed-ups can be achieved through parallelized persistence 
computation. Similar observations have been made recently by Lewis and Zomorodian [12] 
for the computation of (non-persistent) homology; see also [13]. We plan a more extensive 
discussion of the practical effects of our optimizations and parallelization in an extended ver- 
sion of this paper. 

Acknowledgements The authors thank Chao Chen, Herbert Edelsbrunner, and Hubert 
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